Skip to content

Conversation

@sophiatev
Copy link
Contributor

@sophiatev sophiatev commented Jul 14, 2025

This PR introduces enabling extended sessions for orchestrations in the .NET isolated framework. The main changes involve adding a check for a result being injected from the middleware in the case of an extended session, as well as additional information to the dispatchContext so that the middleware can correctly execute the extended session. I also made the IsCompleted method of the TaskOrchestrationExecutor public so that it can be used by the dotnet SDK to remove an extended session from its cache upon completion of the orchestration. The majority of the changes for this feature are in the dotnet SDK, which is responsible for actually executing the orchestrations for .NET isolated.

Other PRs:

Open question

I found this existing TODO and agree that this section of code looks incorrect and dangerous. It basically checks if it is able to acquire the extended session lock (which means that maxConcurrentSessions has not yet been reached at that moment in time), and if it is able to, it sets isExtendedSession to true and immediately releases the lock. I think the more correct logic is that we are in an extended session whenever OnProcessWorkItemAsync is called from within the loop of the OnProcessWorkItemSessionAsync call. That is why I moved the workItem.IsExtendedSession = true statement there. If an extended session ends, then we break out of the loop, and will reload the workItem from storage upon the next orchestration execution, in which case workItem.IsExtendedSession will be false by default.

I removed this section of code in this PR, but the question is - does the CorrelationTraceClient.Propagate call need to be replaced with something else now? What is its point? To log something upon the start of every extended session?

                CorrelationTraceClient.Propagate(
                    () =>
                    {
                        // Check if it is extended session.
                        // TODO: Remove this code - it looks incorrect and dangerous
                        isExtendedSession = this.concurrentSessionLock.Acquire();
                        this.concurrentSessionLock.Release();
                        workItem.IsExtendedSession = isExtendedSession;
                    });

Manual testing done thus far

  • If extended sessions are not enabled, everything still proceeds as expected (the orchestration history is replayed upon every execution)
  • If an extended session expires, a history is sent along with the orchestration request to the worker. The worker replays the orchestration and then saves the state in a new extended session (state is only saved if the orchestration is not completed)
  • Upon completion of an orchestration, the extended session state is evicted from the cache
  • If the worker ends the extended session before the host does (this should be very rare but can happen if e.g. there is a network delay when sending the orchestration request to the worker, or something along those lines), then it will throw a SessionAbortedException. The work item will then be retried again.
  • Other OOProc scenarios still work as expected (they do not have extended sessions enabled so the orchestration is replayed upon every request. If they attempt to enable extended sessions an error occurs upon startup indicating that the extended session feature does not work for these other languages).

Performance testing

Two scenarios were run in in-process with extended sessions enabled/disabled, and in isolated with extended sessions enabled/disabled. Multiple trials were run, and the time it took to complete the orchestration was recorded as well as the number of times the history was loaded in each of these settings (with extendedSessionIdleTimeoutInSeconds set to 30 seconds). The two scenarios tested were

  • Fan out to a large number of tasks (30,000) and then fan-in. For both isolated and in-process, without extended sessions this took about 35 minutes and with extended sessions around 25 minutes. In the isolated model there were consistently two history loads with ES enabled, with the second corresponding to a SessionAbortedException being thrown (so the worker ends the extended session before the host, and as expected, throws the exception which leads to a retry of the work item, this time with a history attached). In in-process, there was just one history load. Without ES, there were about 100 history loads in each.
  • Run a large number of sequential tasks (1000) with large payloads (10,000 character-length strings). For both isolated and in-process, without extended sessions this took about 30 minutes and with extended sessions enabled around 5 minutes. Without ES there were around 100 history loads, and with ES just 1. Interestingly, in isolated this scenario actually ran somewhat faster both with/without ES as opposed to in-process.

@cgillum
Copy link
Member

cgillum commented Jul 17, 2025

I removed this section of code in this PR, but the question is - does the CorrelationTraceClient.Propagate call need to be replaced with something else now? What is its point? To log something upon the start of every extended session?

I'm pretty sure we can just remove this code and not replace the logic. The person who wrote it originally likely didn't really understand what they were doing, and this got missed (my fault) during that PR review.

FYI, the CorrelationTraceClient.Propagate(...) calls support the legacy distributed tracing implementation, which has since been replaced by DTv2. I think we still need to keep the DTv1 stuff around, but I'm pretty sure the code with the TODO can be safely deleted without affecting anything.

@sophiatev sophiatev force-pushed the stevosyan/extended-sessions-for-orchestrations-isolated branch from 7bd7837 to 403829a Compare August 9, 2025 00:04
@sophiatev sophiatev merged commit 35318d1 into main Sep 17, 2025
44 checks passed
@sophiatev sophiatev deleted the stevosyan/extended-sessions-for-orchestrations-isolated branch September 17, 2025 16:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants